image-gen
Image generation, editing, and review via OpenRouter API. Five models from budget to premium. Style presets for series consistency, JSON structured prompts, reference image anchoring, system message support, prompt upsampling. Vision-based quality review loop. Zero dependencies b
What it does
image-gen
Image generation and editing via OpenRouter. Five models, three scripts, style presets, one JSON contract.
Scripts: ./scripts/generate.py, ./scripts/edit.py, ./scripts/review.py
Presets: ./presets/*.json
Output dir: ./data/
Setup
export OPENROUTER_API_KEY_IMAGES='your-api-key-here'
- Claude Code: copy this skill folder into
.claude/skills/image-gen/ - Codex CLI: append this SKILL.md content to your project's root
AGENTS.md
For the full installation walkthrough (prerequisites, API keys, verification, troubleshooting), see references/installation-guide.md.
Credential management
Three tiers for managing the OPENROUTER_API_KEY_IMAGES environment variable:
- Vault skill (recommended): If you have a vault or secret-management skill, store the key there and export it before running scripts. Example:
export OPENROUTER_API_KEY_IMAGES=$(vault get OPENROUTER_API_KEY_IMAGES) - Custom secret manager: Use your team's preferred secret manager (1Password CLI, AWS Secrets Manager, etc.)
- Plain export:
export OPENROUTER_API_KEY_IMAGES='your-api-key-here'in your shell profile
Optional keys for additional features:
OPENAI_API_KEY-- for mask-based inpainting viaedit.py --mode openaiANTHROPIC_API_KEY-- for auto-review viareview.py --auto
Model Selection
What do you need?
|
+-- Fast + cheap + good enough?
| --> nanobanana (~$0.0004/image)
|
+-- High quality, no text?
| --> flux.2-pro (best visual quality)
|
+-- Text in the image?
| --> gpt-5-image (best text rendering)
|
+-- Image editing?
| +-- Describe changes in words --> gpt-5-image or nanobanana-pro
| +-- Paint mask area to change --> edit.py --mode openai
|
+-- Budget generation at scale?
| --> flux.2-klein (fastest, cheapest Flux)
|
+-- Quality + editing + reasoning?
--> nanobanana-pro (best balance)
| Alias | Type | Cost | Best For |
|---|---|---|---|
flux.2-pro | Image-only | ~$0.03/MP | Default high-quality generation |
flux.2-klein | Image-only | ~$0.014/MP | Fast, budget generation |
gpt-5-image | Text+Image | ~$0.04/image | Text rendering, complex edits |
nanobanana-pro | Text+Image | ~$0.012/image | Balanced quality + editing |
nanobanana | Text+Image | ~$0.0004/image | Lowest-cost generation |
Full comparison: ./references/model-card.md
Quick Reference
# Generate with default model (flux.2-pro)
python ./scripts/generate.py \
--prompt "A red fox in snow" \
--output-dir ./data/
# Generate with style preset
python ./scripts/generate.py \
--prompt "A scene description for consistent series" \
--preset default \
--output-dir ./data/
# Generate with style reference image
python ./scripts/generate.py \
--prompt "A new scene" \
--style-ref /path/to/golden-image.png \
--output-dir ./data/
# Generate with multiple style refs
python ./scripts/generate.py \
--prompt "Scene desc" \
--style-ref /path/to/ref1.png \
--style-ref /path/to/ref2.png
# Generate with system message (GPT-5 / NanoBanana only)
python ./scripts/generate.py \
--prompt "Scene desc" \
--model gpt-5-image \
--system-prompt "You generate muted watercolor illustrations..."
# Generate with prompt upsampling disabled
python ./scripts/generate.py \
--prompt "Exact scene" \
--model flux.2-pro \
--no-prompt-upsampling
# Generate with options (model, aspect ratio, size)
python ./scripts/generate.py \
--prompt "Tokyo skyline at sunset" \
--model nanobanana-pro \
--aspect-ratio 16:9 \
--size 2K \
--output-dir ./data/
# Generate with text (GPT-5 Image)
python ./scripts/generate.py \
--prompt 'Poster with text "HELLO WORLD" in bold sans-serif typography' \
--model gpt-5-image \
--output-dir ./data/
# Edit image (chat-based)
python ./scripts/edit.py \
--mode openrouter \
--input-image ./data/input.png \
--prompt "Change the background to a sunset beach" \
--model gpt-5-image \
--output-dir ./data/
# Edit image (mask-based)
python ./scripts/edit.py \
--mode openai \
--input-image ./data/input.png \
--mask ./data/mask.png \
--prompt "Replace masked area with a small bonsai tree" \
--openai-size 1024x1024 \
--output-dir ./data/
# Review quality (auto mode)
python ./scripts/review.py \
--image ./data/output.png \
--original-prompt "A red fox in snow" \
--auto
Style Presets
Presets encode visual identity into reusable JSON files. A preset defines palette, composition, rendering style, model defaults, and system messages.
How Presets Work
- Pick a preset based on the project context
generate.py --preset <name>loads the preset JSON- The script applies the preset: enhances the prompt with style data, selects model defaults, injects system messages
- For Flux models: prompt is constructed as JSON (structured prompt, prevents concept bleeding)
- For GPT-5/NanoBanana: style block is prepended as natural language, system message is injected
Available Presets
| Preset | File | Description |
|---|---|---|
default | presets/default.json | No style constraints. Quality-focused defaults. |
Preset Schema
{
"name": "preset-name",
"description": "What this preset is for",
"defaults": {
"model": "flux.2-pro",
"aspect_ratio": "3:2",
"size": "2K"
},
"style": {
"description": "Overall style description",
"color_palette": ["#hex1", "#hex2", "#hex3"],
"mood": "Emotional tone",
"lighting": "Lighting description",
"composition": "Composition rules",
"rendering": "Rendering constraints",
"camera": {"angle": "...", "framing": "..."},
"anti_patterns": ["thing to avoid", "another thing"],
"reference_images": ["/absolute/path/to/golden.png"]
},
"system_message": "System prompt for GPT-5/NanoBanana models"
}
Priority Order (CLI > Preset > Hardcoded)
--modelflag overridespreset.defaults.model--aspect-ratioflag overridespreset.defaults.aspect_ratio--sizeflag overridespreset.defaults.size--system-promptflag overridespreset.system_message- If no preset and no flag: hardcoded defaults (flux.2-pro, 1:1, 2K)
Creating a New Preset
- Copy
presets/default.jsonas a template - Set
nameanddescription - Fill
defaultswith preferred model, aspect ratio, size - Fill
stylewith palette (HEX values), composition rules, rendering constraints - Write
system_messagefor GPT-5/NanoBanana (ignored by Flux) - Optionally add
reference_imagespaths for visual anchoring - Test:
python generate.py --prompt "test scene" --preset your-preset
Style References
Use --style-ref to pass reference images for visual anchoring. The script prepends a style transfer instruction automatically.
# Single reference (anchor to a "golden" image)
python ./scripts/generate.py --prompt "New scene" \
--style-ref /path/to/golden.png
# Multiple references (combine style + content refs, up to 8)
python ./scripts/generate.py --prompt "New scene" \
--style-ref /path/to/style-ref.png \
--style-ref /path/to/character-ref.png
Reference images from presets (style.reference_images) are automatically loaded alongside CLI refs.
Full workflow and per-model consistency techniques: ./references/style-consistency.md
System Messages
Use --system-prompt to set persistent style context for GPT-5 Image and NanoBanana models. System messages are injected as the system role, keeping the user prompt focused on scene content only.
python ./scripts/generate.py \
--prompt "A quiet village at dawn" \
--model gpt-5-image \
--system-prompt "You generate muted watercolor illustrations with earth-tone palettes..."
System messages can also be set in presets via the system_message field. CLI --system-prompt overrides preset system messages.
Flux models do not support system messages (the flag is silently ignored).
Prompt Upsampling
Flux models support prompt_upsampling -- an API feature that auto-enhances basic prompts into richer descriptions before generation.
- Default: ON for Flux models (flux.2-pro, flux.2-klein)
- Disable:
--no-prompt-upsamplingwhen you want exact prompt control - Enable:
--prompt-upsampling(explicit, same as default) - Non-Flux models: Flag is silently ignored
# Default: upsampling ON (good for short/simple prompts)
python ./scripts/generate.py --prompt "A cat" --model flux.2-pro
# Disable: exact prompt control (good for precise/pre-enhanced prompts)
python ./scripts/generate.py --prompt "Detailed exact scene..." \
--model flux.2-pro --no-prompt-upsampling
Prompt Engineering
Enhance-before-generate protocol
- Identify intent: photo, illustration, logo, diagram, etc.
- Select preset if this is series work.
- Fill missing details: subject, style, lighting, composition, camera/look.
- Tune prompt shape to model: concise for Flux, more explicit for GPT-5/NanoBanana family.
- Add quality boosters that match requested style.
Example: raw prompt -> enhanced prompt
- Raw prompt:
a cat in space - Enhanced prompt:
Orange tabby cat in a custom spacesuit floating in zero gravity, Earth visible in helmet reflection, dramatic rim lighting, cinematic framing, ultra-detailed, high contrast, clean background
Templates, per-model formats, JSON structured prompts, and examples: ./references/prompt-templates.md
Generation Workflow
1. Select Preset (if applicable)
If the image is part of a series or project with established visual identity, select the appropriate preset.
2. Enhance Prompt
Transform the raw prompt into a model-optimized prompt. Each model interprets prompts differently:
- Flux: Concise, front-loaded, under 80 words. JSON for complex multi-element scenes. HEX colors for precision.
- GPT-5 Image: Detailed natural language. System messages for series work. Text in quotes for rendering.
- NanoBanana: Structured narrative. Constrain palette explicitly for muted styles ("no neon, desaturated").
3. Generate
python ./scripts/generate.py \
--prompt "[enhanced prompt]" \
--model [chosen model] \
--preset [preset if applicable] \
--aspect-ratio [ratio] \
--size [1K|2K|4K]
Output: JSON to stdout with path, model, cost_estimate, generation_time_ms.
When a preset is used, the script handles prompt enhancement internally. Enhance the scene description but do NOT manually add style parameters -- the preset handles that.
4. Review (Optional)
For quality-critical work:
python ./scripts/review.py \
--image /path/to/output.png \
--original-prompt "[the enhanced prompt]" \
--auto
Output: JSON with score, verdict (accept/refine/reject), critique, suggested_refinement.
accept(score >= 7): Deliver to userrefine(score 4-6): Re-generate withsuggested_refinementappliedreject(score < 4): Re-generate with different model or reworked prompt
5. Deliver
Return the file path and a brief description.
Editing Workflow
Chat-Based Editing (OpenRouter)
For natural language edits: "make it darker", "change background to beach", "add a hat".
python ./scripts/edit.py --mode openrouter \
--input-image /path/to/original.png \
--prompt "Change the background to a sunset beach" \
--model nanobanana-pro
Best models for chat editing: gpt-5-image, nanobanana-pro.
Mask-Based Inpainting (OpenAI Direct)
For precise area editing with a mask image.
python ./scripts/edit.py --mode openai \
--input-image /path/to/original.png \
--mask /path/to/mask.png \
--prompt "A fluffy orange cat" \
--openai-size 1024x1024
Requires OPENAI_API_KEY. Mask: PNG with transparent areas where edits should happen.
Script Output Contract
All scripts output JSON to stdout. Parse JSON, never text.
generate.py success
{
"success": true,
"path": "/abs/path/to/20260217-143052-flux-2-pro.png",
"all_paths": ["/abs/path/to/20260217-143052-flux-2-pro.png"],
"model": "black-forest-labs/flux.2-pro",
"prompt": "the user prompt",
"enhanced_prompt": "the preset-enhanced prompt (if different)",
"preset": "my-preset",
"aspect_ratio": "16:9",
"size": "2K",
"cost_estimate": "~$0.120",
"generation_time_ms": 8500,
"image_count": 1,
"style_refs": ["/path/to/ref.png"],
"system_message_used": true
}
generate.py / edit.py error
{
"success": false,
"error": "HTTP 429: Too Many Requests",
"details": "Rate limit exceeded...",
"model": "black-forest-labs/flux.2-pro",
"generation_time_ms": 150
}
review.py output (auto mode)
{
"success": true,
"mode": "auto",
"score": 8,
"prompt_adherence": 9,
"technical_quality": 8,
"composition": 7,
"verdict": "accept",
"critique": "Strong prompt adherence with vivid colors. Minor composition issue with empty right third.",
"suggested_refinement": ""
}
CLI Flags Reference
generate.py
| Flag | Default | Description |
|---|---|---|
--prompt | required | Generation prompt |
--model | preset or flux.2-pro | Model alias or full OpenRouter ID |
--aspect-ratio | preset or 1:1 | 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9 |
--size | preset or 2K | 1K, 2K, 4K |
--output-dir | ./data/ | Output directory |
--output-file | auto-named | Explicit output path |
--seed | none | Random seed (model-dependent) |
--input-image | none | Input image for editing |
--preset | none | Style preset name or path to JSON file |
--style-ref | none | Style reference image path (repeatable, up to 8) |
--prompt-upsampling | on for Flux | Enable Flux prompt_upsampling |
--no-prompt-upsampling | Disable prompt_upsampling | |
--system-prompt | preset or none | System message for GPT-5/NanoBanana |
edit.py
| Flag | Default | Description |
|---|---|---|
--mode | required | openrouter or openai |
--input-image | required | Image to edit |
--prompt | required | Edit instruction |
--mask | none | Mask PNG (openai mode) |
--model | gpt-5-image | Model for openrouter mode |
--openai-model | gpt-image-1 | Model for openai mode |
--openai-size | none | Size for openai mode |
--openai-quality | none | Quality for openai mode |
--output-dir | ./data/ | Output directory |
--output-file | auto-named | Explicit output path |
review.py
| Flag | Default | Description |
|---|---|---|
--image | required | Image to review |
--original-prompt | required | Prompt used for generation |
--auto | false | Auto-review via Anthropic API |
Cost Tracking
Report cost after every generation. Use the cost_estimate field from script output.
| Model | Typical Cost |
|---|---|
| NanoBanana | $0.0004 |
| Flux 2 Klein | $0.01-0.03 |
| NanoBanana Pro | $0.01-0.03 |
| Flux 2 Pro | $0.03-0.12 |
| GPT-5 Image | $0.03-0.10 |
Tip: for batch generation (5+ images), prefer NanoBanana or Flux 2 Klein to control spend.
Error Handling
| Error | Action |
|---|---|
| No API key | Script exits with JSON error. Check env vars. |
| HTTP 429 (rate limit) | Wait 10s, retry. If persistent, switch model. |
| HTTP 402 (no credits) | Top up OpenRouter account. |
| No images in response | Check model supports image output. Try different model. |
| Timeout (>180s) | Model may be overloaded. Try Klein or NanoBanana for speed. |
| Image quality too low | Run review loop. Refine prompt or switch to higher-quality model. |
| Preset not found | Check presets/ directory. Use preset name without .json extension. |
Anti-Patterns
| Do NOT | Do Instead |
|---|---|
| Send raw user prompts to models | Always enhance prompts first |
| Use Flux for text in images | Use GPT-5 Image or NanoBanana Pro |
| Use GPT-5 for bulk generation | Use NanoBanana or Flux Klein (10-100x cheaper) |
| Skip cost reporting | Always report estimated cost |
| Retry same prompt on failure | Rework prompt or switch model |
| Use review loop for casual requests | Reserve for quality-critical work |
| Forget to set API key before running | Export required keys before running scripts |
| Use JSON prompts for GPT-5 Image | GPT-5 prefers natural language; JSON adds no benefit |
| Use verbal color descriptions with Flux | Use HEX values in palette for precise control |
| Generate series images without a preset | Create a preset for any 3+ image series |
| Forget --style-ref for consistency | Use golden image as reference for series work |
Bundled Resources Index
| Path | What | When to Load |
|---|---|---|
./scripts/generate.py | Core image generation script | Every generation task |
./scripts/edit.py | Chat-based and mask-based editing | Image modification requests |
./scripts/review.py | Vision-based quality review | Quality-critical workflows |
./presets/default.json | Default preset (no style constraints) | Reference for preset schema |
./references/prompt-templates.md | SOTA prompt engineering: per-model formats, JSON templates, style modifiers, enhancement protocol | Prompt engineering step |
./references/style-consistency.md | Reference image workflow, seed workflow, consistency technique stack | Generating image series requiring visual consistency |
./references/model-card.md | Model capabilities, tradeoffs, pricing context | Model selection and optimization |
./references/installation-guide.md | Detailed install walkthrough for Claude Code and Codex CLI | First-time setup or environment repair |
./references/api-reference.md | API payload and integration details | Debugging and advanced usage |
./references/book-to-prompts-playbook.md | Method for extracting visual prompts from literature | Book illustration projects |
./examples/ | Example outputs by model | Visual quality and style calibration |
./UPDATES.md | Changelog for this skill | Checking new features/fixes |
./UPDATE-GUIDE.md | Agent-oriented update instructions | Applying updates safely |
Staying Updated
This skill ships with an UPDATES.md changelog and UPDATE-GUIDE.md for your AI agent.
After installing, tell your agent: "Check UPDATES.md in the image-gen skill for any new features or changes."
When updating, tell your agent: "Read UPDATE-GUIDE.md and apply the latest changes from UPDATES.md."
Follow UPDATE-GUIDE.md so customized local files are diffed before any overwrite.
To check upstream updates directly from GitHub:
curl -fsSL https://raw.githubusercontent.com/buildoak/fieldwork-skills/main/skills/image-gen/UPDATES.md | head -40
Capabilities
Install
Quality
deterministic score 0.46 from registry signals: · indexed on github topic:agent-skills · 15 github stars · SKILL.md body (18,882 chars)