Nano Banana - AI Image Generation

Generate and edit images using Google Gemini models. Supports two models:

Pro (`gemini-3-pro-image-preview`): High quality, complex prompts, thinking mode
Flash (`gemini-2.5-flash-image`): Fast, cheap, good for iteration

Prerequisites

Required:

`GEMINI_API_KEY`: Get from Google AI Studio
`uv` (recommended) or Python 3.10+ with `google-genai` installed

With `uv` (recommended, zero setup): Dependencies are declared inline via PEP 723 and auto-installed on first run. Just use `uv run` instead of `python3`.

With pip (fallback):

bash

pip install -r <skill_dir>/requirements.txt

Quick Start

Default output: Images save to `~/Downloads/nanobanana_<timestamp>.png` automatically. Do NOT pass `-o` unless the user specifies where to save. If the user provides a filename without a directory (e.g., "save it as robot.png"), use `-o ~/Downloads/robot.png`.
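
The output rule above can be sketched as a small helper for a calling script. This is an illustration only, not code from `generate.py`; the function name `output_args` is hypothetical, and it assumes the caller builds the command line as a list of arguments.

```python
from pathlib import Path
from typing import List, Optional


def output_args(requested: Optional[str]) -> List[str]:
    """Build the -o arguments for generate.py from what the user asked for."""
    if not requested:
        # No location given: omit -o and let the script use its default.
        return []
    if Path(requested).name == requested:
        # Bare filename such as "robot.png": place it in ~/Downloads.
        return ["-o", str(Path.home() / "Downloads" / requested)]
    # The user named a directory: honor it, expanding a leading "~".
    return ["-o", str(Path(requested).expanduser())]
```

For example, `output_args("robot.png")` yields `-o` followed by the path to `robot.png` inside the user's Downloads folder, while `output_args(None)` yields no arguments at all.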

Generate an image:

bash

uv run <skill_dir>/scripts/generate.py "a cute robot mascot, pixel art style"

Edit an existing image:

bash

uv run <skill_dir>/scripts/generate.py "make the background blue" -i input.jpg

Use Flash model for fast iteration:

bash

uv run <skill_dir>/scripts/generate.py "quick sketch of a cat" --model flash

Multi-image reference (style + subject):

bash

uv run <skill_dir>/scripts/generate.py "apply the style of the first image to the second" \
  -i style_ref.png subject.jpg

Generate with specific aspect ratio and resolution:

bash

uv run <skill_dir>/scripts/generate.py "cinematic landscape" --ratio 21:9 --size 4K

Save to a specific location:

bash

uv run <skill_dir>/scripts/generate.py "logo design" -o ~/Projects/brand/logo.png

Model Selection Guide

| | Pro (default) | Flash |
| --- | --- | --- |
| Speed | Slower | ~2-3x faster |
| Cost | Higher | Lower |
| Text rendering | Good | Unreliable |
| Complex scenes | Excellent | Adequate |
| Thinking mode | Yes | No |
| Best for | Final production images | Exploration, drafts, batch |

Rule of thumb: Use Flash for exploration and batch generation, Pro for final output.
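
A minimal sketch of how the short aliases and the rule of thumb might be expressed in a calling script. The two model IDs come from this document; the helper names `resolve_model` and `model_for_stage`, and the stage labels, are hypothetical and do not describe how `generate.py` resolves `--model` internally.

```python
MODEL_ALIASES = {
    "pro": "gemini-3-pro-image-preview",
    "flash": "gemini-2.5-flash-image",
}

# Hypothetical stage labels for the "Flash first, Pro last" rule of thumb.
DRAFT_STAGES = {"explore", "draft", "batch"}


def resolve_model(name: str = "pro") -> str:
    """Map a short alias to its model ID; pass full model IDs through unchanged."""
    return MODEL_ALIASES.get(name.lower(), name)


def model_for_stage(stage: str) -> str:
    """Pick Flash for exploration and batch work, Pro for everything else."""
    return resolve_model("flash" if stage in DRAFT_STAGES else "pro")
```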

Script Reference

scripts/generate.py

Main image generation script.

Usage: generate.py [OPTIONS] PROMPT

Arguments:
  PROMPT                Text prompt for image generation

Options:
  -o, --output PATH     Output file path (default: ~/Downloads/nanobanana_<timestamp>.png)
  -i, --input PATH...   Input image(s) for editing / reference (up to 14)
  -m, --model MODEL     Model: 'pro' (default), 'flash', or full model ID
  -r, --ratio RATIO     Aspect ratio (1:1, 16:9, 9:16, 21:9, etc.)
  -s, --size SIZE       Image size: 1K, 2K, or 4K (default: standard)
  --search              Enable Google Search grounding for accuracy
  --retries N           Max retries on rate limit (default: 3)
  -v, --verbose         Show detailed output

Supported aspect ratios:

`1:1`: Square (default)
`2:3`, `3:2`: Portrait/Landscape
`3:4`, `4:3`: Standard
`4:5`, `5:4`: Photo
`9:16`, `16:9`: Widescreen
`21:9`: Ultra-wide/Cinematic

Image sizes:

`1K`: Fast, lower detail
`2K`: Enhanced detail (2048px)
`4K`: Maximum quality (3840px), best for text rendering
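
A downstream script may want to check these options before spending an API call. The sketch below assumes the lists in this section are exhaustive, which the script itself may not enforce; `validate_options` is a hypothetical helper, not part of `generate.py`.

```python
SUPPORTED_RATIOS = {
    "1:1", "2:3", "3:2", "3:4", "4:3",
    "4:5", "5:4", "9:16", "16:9", "21:9",
}
SUPPORTED_SIZES = {"1K", "2K", "4K"}


def validate_options(ratio=None, size=None):
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    if ratio is not None and ratio not in SUPPORTED_RATIOS:
        problems.append(f"unsupported aspect ratio: {ratio}")
    if size is not None and size.upper() not in SUPPORTED_SIZES:
        problems.append(f"unsupported image size: {size}")
    return problems
```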

scripts/batch_generate.py

Generate multiple images with sequential naming.

Usage: batch_generate.py [OPTIONS] PROMPT

Arguments:
  PROMPT                Text prompt for image generation

Options:
  -n, --count N         Number of images to generate (default: 10)
  -d, --dir PATH        Output directory (default: ~/Downloads)
  -p, --prefix STR      Filename prefix (default: "image")
  -m, --model MODEL     Model: 'pro' (default), 'flash', or full model ID
  -r, --ratio RATIO     Aspect ratio
  -s, --size SIZE       Image size (1K/2K/4K)
  --search              Enable Google Search grounding
  --retries N           Max retries per image on rate limit (default: 3)
  --delay SECONDS       Delay between generations (default: 3)
  --parallel N          Concurrent requests (default: 1, max recommended: 5)
  -q, --quiet           Suppress progress output

Example:

bash

uv run <skill_dir>/scripts/batch_generate.py "pixel art logo" -n 20 --model flash -d ./logos -p logo

Python API

Direct import (from another skill's script):

Note: When importing as a Python module, `google-genai` must be available in the calling script's environment. If using `uv run`, add a PEP 723 `dependencies` block to your own script (see the example in Pattern 2 below).

python

import sys
from pathlib import Path
sys.path.insert(0, str(Path("<skill_dir>/scripts")))
from generate import generate_image, edit_image, batch_generate

# Generate image
result = generate_image(
    prompt="a futuristic city at night",
    output_path="city.png",
    aspect_ratio="16:9",
    image_size="4K",
    model="pro",
)

# Edit existing image
result = edit_image(
    prompt="add flying cars to the sky",
    input_path="city.png",
    output_path="city_edited.png",
)

# Multi-image reference
result = generate_image(
    prompt="combine the color palette of the first with the composition of the second",
    input_paths=["palette_ref.png", "composition_ref.png"],
    output_path="combined.png",
)

Return structure (always present):

python

{
    "success": True,       # or False
    "path": "/path/to/output.png",  # or None on failure
    "error": None,         # or error message string
    "metadata": {
        "model": "gemini-3-pro-image-preview",
        "prompt": "...",
        "aspect_ratio": "16:9",
        "image_size": "4K",
        "use_search": False,
        "input_images": None,        # or list of paths
        "text_response": "...",      # optional text from model
        "thinking": "...",           # Pro model reasoning (when available)
        "timestamp": "2025-01-26T...",
    }
}
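
Because every key in the return structure is always present, a caller can branch on `success` without defensive lookups. The following is a minimal sketch; `describe_result` is a hypothetical helper, and the sample dictionaries in the usage are hand-written stand-ins rather than real API output.

```python
def describe_result(result: dict) -> str:
    """Turn a generate_image / edit_image result into a one-line status message."""
    if result["success"]:
        model = result["metadata"].get("model", "unknown model")
        return f"saved {result['path']} ({model})"
    # On failure, "path" is None and "error" carries the message.
    return f"failed: {result['error']}"


ok = {
    "success": True,
    "path": "/tmp/city.png",
    "error": None,
    "metadata": {"model": "gemini-3-pro-image-preview"},
}
bad = {"success": False, "path": None, "error": "No image in response", "metadata": {}}

print(describe_result(ok))
print(describe_result(bad))
```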

Downstream Skill Integration Guide

Pattern 1: CLI wrapper (recommended for simple use)

bash

# In your skill's script:
uv run <nanobanana_dir>/scripts/generate.py "{prompt}" --model flash --ratio 16:9 -o output.png

Pattern 2: Python import with custom defaults

python

# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "google-genai>=1.0.0",
# ]
# ///

import sys
from pathlib import Path

NANOBANANA_DIR = Path("<nanobanana_dir>/scripts")
sys.path.insert(0, str(NANOBANANA_DIR))
from generate import generate_image

def generate_thumbnail(prompt: str, output_path: str) -> dict:
    """Generate a YouTube thumbnail with project defaults."""
    return generate_image(
        prompt=prompt,
        output_path=output_path,
        aspect_ratio="16:9",
        image_size="2K",
        model="flash",
        max_retries=3,
    )

Pattern 3: Batch with progress tracking

python

from batch_generate import batch_generate

def on_progress(completed, total, result):
    print(f"Progress: {completed}/{total}")

results = batch_generate(
    prompt="logo concept",
    count=20,
    output_dir="./logos",
    prefix="logo",
    model="flash",
    aspect_ratio="1:1",
    on_progress=on_progress,
)

successful = [r for r in results if r["success"]]
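
For larger batches it can help to see why images failed, not only which ones succeeded. This sketch assumes each entry in `results` follows the return structure documented earlier; `summarize_batch` is a hypothetical helper.

```python
from collections import Counter


def summarize_batch(results):
    """Collect saved paths and count failures by error message."""
    saved = [r["path"] for r in results if r["success"]]
    errors = Counter(r["error"] for r in results if not r["success"])
    return {
        "saved": saved,
        "failed": sum(errors.values()),
        "errors": dict(errors),
    }
```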

Pattern 4: Sequential generation for series

When a downstream skill needs multiple consistently-styled images (e.g., newsletter visuals, thumbnail A/B variants), use the anchor-and-reference pattern:

python

from generate import generate_image

# Step 1: Generate the style anchor
anchor = generate_image(
    prompt="warm illustration style, earth tones, soft gradients, clean lines",
    output_path="anchor.png",
    model="pro",
)

# Step 2: Generate each image in the series, referencing the anchor
subjects = ["laptop on desk with coffee", "person reading a book", "sunrise over mountains"]
series_paths = [anchor["path"]]

for i, subject in enumerate(subjects):
    result = generate_image(
        prompt=f"{subject}, matching the visual style and color palette of the reference image exactly",
        input_paths=[anchor["path"]],  # always include the anchor
        output_path=f"series_{i+1:02d}.png",
        model="pro",
    )
    if result["success"]:
        series_paths.append(result["path"])

The full sequential generation patterns are documented in the Sequential Generation section below.

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| `GEMINI_API_KEY` | Google Gemini API key | Required |
| `IMAGE_OUTPUT_DIR` | Default output directory | `~/Downloads` |
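
A calling script could read these two variables as sketched below. This is an assumption about reasonable handling, not a copy of what the scripts do; `load_settings` is a hypothetical name, and the error text simply mirrors the message listed under Troubleshooting.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the API key (required) and the output directory (optional)."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY")
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY environment variable not set")
    # Fall back to ~/Downloads when IMAGE_OUTPUT_DIR is not set.
    output_dir = Path(env.get("IMAGE_OUTPUT_DIR", "~/Downloads")).expanduser()
    return {"api_key": api_key, "output_dir": output_dir}
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real environment.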

Features

Text-to-Image Generation

Create images from text descriptions. Both models excel at:

Photorealistic images
Artistic styles (pixel art, illustration, etc.)
Product photography
Landscapes and scenes

Image Editing

Transform existing images with natural language:

Style transfer
Object addition/removal
Background changes
Color adjustments

Multi-Image Reference

Provide up to 14 reference images for:

Style consistency across a series
Subject consistency (same character, different poses)
Brand-consistent generation
Style + subject combination

High-Resolution Output

1K — Fast generation, good for drafts
2K — Enhanced detail (2048px)
4K — Maximum quality (3840px), best for text rendering

Google Search Grounding

Enable `--search` for factually accurate images involving:

Real people, places, landmarks
Current events
Specific products or brands

Automatic Retry

Rate limit errors are automatically retried with exponential backoff (default: 3 retries). No action needed from callers.
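
For readers unfamiliar with the technique, exponential backoff can be sketched as follows. This is a generic illustration, not the scripts' actual retry code: `RateLimitError` is a hypothetical stand-in for whatever exception the API client raises, and the one-second base delay is an assumption.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the client's rate-limit exception."""


def call_with_backoff(fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with delays of base, 2x base, 4x base, ..."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                # Retries exhausted: surface the error to the caller.
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the waiting behavior can be observed or skipped in tests.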

SynthID Watermark Notice

All images generated by Gemini contain an invisible SynthID digital watermark. This is automatic, cannot be disabled, and survives common transformations (resize, crop, compression). Be aware of this for any use case requiring watermark-free output.

Sequential Generation

Use sequential generation to maintain visual consistency across a series of images. The core technique: generate an anchor image first, then pass it as a reference (`-i`) for every subsequent image in the series.

Pattern 1: Style-Board Anchoring

Generate a single anchor image that establishes the visual identity for a series. Reference it for all subsequent images.

When to use: Newsletter visual series, A/B thumbnail variants, brand-consistent image batches.

Workflow:

1. Generate the anchor image with a prompt emphasizing style, palette, and mood:

bash

uv run <skill_dir>/scripts/generate.py \
  "modern flat illustration style, warm earth tones, soft gradients, clean lines, \
  minimal detail, cozy atmosphere" \
  --model pro -o anchor.png

2. Generate each subsequent image referencing the anchor:

bash

uv run <skill_dir>/scripts/generate.py \
  "a laptop on a desk with coffee, matching the visual style, color palette, \
  and lighting of the reference image exactly" \
  -i anchor.png --model pro -o image_01.png

3. Repeat step 2 for each image in the series, always referencing the same anchor.

Tip: Use Flash to draft the anchor quickly, then regenerate with Pro once you find a style you like.

Pattern 2: Subject Consistency

Keep the same character or subject looking consistent across different scenes and poses.

When to use: Mascot in multiple contexts, product photography series, recurring character.

Workflow:

1. Generate the initial subject with a clear, detailed appearance description:

bash

uv run <skill_dir>/scripts/generate.py \
  "a friendly robot mascot with round blue body, orange antenna, large expressive eyes, \
  simple geometric design, standing front-facing on white background" \
  --model pro -o subject_front.png

2. Generate new scenes referencing the subject:

bash

uv run <skill_dir>/scripts/generate.py \
  "the same robot character from the reference image, now sitting at a desk typing, \
  same proportions and colors, office background" \
  -i subject_front.png --model pro -o subject_office.png

3. For stronger consistency, reference 2-3 of the best previous outputs:

bash

uv run <skill_dir>/scripts/generate.py \
  "the same robot character from the reference images, now outdoors in a park, \
  same proportions and colors, waving at the viewer" \
  -i subject_front.png subject_office.png --model pro -o subject_park.png

Pattern 3: Progressive Accumulation

Build a reference pool over a long series, adding each successful output as a reference for the next.

When to use: Series of 5+ images where consistency must compound across the full set.

Workflow:

1. Generate the anchor (same as Pattern 1, step 1).
2. Generate image 2 referencing the anchor.
3. Generate image 3 referencing anchor + image 2.
4. Continue, keeping the 3-4 strongest references in the `-i` list. Drop weaker outputs.

Why cap at 3-4 references: More references dilute the style signal. The model averages across all inputs — too many and the result loses coherence. Keep only the images that best represent the target style.

Reference ordering matters: Place the style anchor first in the `-i` list. The model weights earlier references slightly more.
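
The bookkeeping for progressive accumulation can be sketched as a small selection function: keep the anchor first, then the strongest few outputs. The scores are assumed to be ratings you assign yourself when reviewing outputs; `select_references` is a hypothetical helper, and nothing here calls the API.

```python
def select_references(anchor, candidates, max_refs=4):
    """Return the reference list for the next image: anchor first, then the best outputs.

    candidates is a list of (path, score) pairs, where a higher score means
    a closer match to the target style.
    """
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    best = [path for path, _ in ranked if path != anchor]
    # Reserve one slot for the anchor, which always goes first.
    return [anchor] + best[: max_refs - 1]
```

The returned list can be passed as `input_paths` to `generate_image`, or joined after `-i` on the command line.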

Best Practices

Prompt Writing

Good prompts include:

Subject description
Style/aesthetic
Lighting and mood
Composition details
Color palette

See references/prompts.md for detailed prompt templates by category and model-specific tips.

Batch Generation Tips

Use `--model flash` for exploration batches (faster, cheaper)
Generate 10-20 variations to explore options
Default 3-second delay between sequential requests avoids rate limits
Review results and iterate on best candidates with Pro model

Rate Limits

Gemini API has usage quotas (~10 RPM free tier)
Automatic retry with exponential backoff handles transient rate limits
For large batches, use `--delay 5` or `--parallel` with modest concurrency
Check your quota at Google AI Studio
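
As a rough way to choose a `--delay` value, the arithmetic below converts a requests-per-minute quota into a pause between sequential requests. It is a sketch under stated assumptions: the ~10 RPM figure above is approximate, the average generation time is something you would measure yourself, and the delay is assumed to run from the end of one request to the start of the next.

```python
def min_interval_seconds(rpm: float) -> float:
    """Minimum spacing between request starts that stays within rpm requests per minute."""
    return 60.0 / rpm


def suggested_delay(rpm: float, avg_generation_seconds: float = 0.0) -> float:
    """Pause to add after each request, given how long a request itself takes."""
    return max(0.0, min_interval_seconds(rpm) - avg_generation_seconds)
```

If each request already takes longer than the required spacing, no extra delay is needed and the function returns zero.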

Troubleshooting

"uv: command not found"

Install uv with the standalone installer (macOS/Linux):

curl -LsSf https://astral.sh/uv/install.sh | sh

Or with Homebrew:

brew install uv

"Error: google-genai package not installed"

Use `uv run` instead of `python3` to auto-install dependencies

Or install manually:

pip install -r <skill_dir>/requirements.txt

"GEMINI_API_KEY environment variable not set"

Set `GEMINI_API_KEY` in your environment before running

"No image in response"

Prompt may have triggered safety filters
Try rephrasing to avoid sensitive content

"Rate limit exceeded after N retries"

Wait 30-60 seconds and try again
Reduce batch parallelism or add longer delays
Check your API quota

Import errors in batch_generate.py

The script handles its own path setup; run from any directory

Future Capabilities

Multi-turn conversational editing: The Gemini API supports stateful chat sessions for iterative image editing (e.g., "make it bluer" → "now add a hat" → "zoom out"). Supporting this would require a fundamentally different, stateful architecture, so it is not currently implemented. No downstream skill currently needs it.

References

references/prompts.md — Prompt examples, model-specific tips, multi-reference patterns
references/gemini-api.md — Curated API reference for agent context
