Nano Banana
Generate high-quality presentation slides as images using Gemini's image generation API, review them interactively in a browser, and iteratively edit based on feedback.
When to Use This Skill
- User asks to create a presentation, slide deck, or PPT
- User wants to generate visual slides for a talk or lecture
- User has a document or outline and wants slides based on it
- User says "make me a PPT", "generate slides", "create a presentation"
- User wants to edit or refine existing generated slides
- User needs high-quality figures, diagrams, or illustrations for papers or documents
- User asks to generate research figures, architecture diagrams, or concept illustrations
Do NOT use for:
- Writing academic papers → use
- Planning academic conference talk narrative structure → use
Before You Start: Prerequisites
Before proceeding with any slide generation, verify these prerequisites:
- API Key: Check that a Google API key is available. Run:
  If empty, ask the user to provide one. They can either:
  - Set it via config: EvoSci config set google_api_key <key>
  - Provide it directly (pass via argument)
  - If the user provides the key in conversation, pass it to scripts with
- Language: Ask the user what language the slide content should be in. This affects the content you write in slides_plan.json, not the style template.
Core Workflow
Phase 1: Content Planning Conversation ← most important phase
Phase 2: Generate slides_plan.json
Phase 3: Select Style & Generate Slides
Phase 4: Launch Review Server
Phase 5: Apply Feedback Edits ← repeat Phase 4-5 until satisfied
Phase 6: Package as PPTX
Phase 7: Cleanup
Follow these phases in order. Do NOT skip Phase 1 — the quality of generated slides depends directly on planning depth.
Phase 1: Content Planning Conversation
This is the most critical phase. Rushing to generation without proper planning produces mediocre slides. Engage the user in a structured conversation:
Step 1 — Understand the context:
- What is the topic of the presentation?
- Who is the audience? (technical peers, executives, students, general public)
- How long is the talk? (this determines page count)
- What is the occasion? (conference, internal talk, lecture, pitch)
Step 2 — Define the storyline:
- What is the opening hook? (a surprising fact, a question, a trend)
- What are the 3-5 main sections or arguments?
- What is the key takeaway the audience should remember?
- What is the closing message?
Step 3 — Outline per-page content:
- For each slide, agree on: title + 2-4 key points + visual description
- Identify which slides are cover, content, or data type
- Ensure logical flow between pages
Duration-to-page-count guidance:
| Duration | Pages | Structure |
|---|---|---|
| 5 min | 5 | Cover + 3 content + closing |
| 10-15 min | 8-12 | Cover + intro + 3-4 sections + summary + closing |
| 20-30 min | 15-20 | Cover + intro + 5-6 sections + summary + closing |
| 45-60 min | 25-30 | Cover + intro + 7-9 sections (2-3 pages each) + summary + closing |
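The duration table above can be turned into a small lookup when proposing a page count. This is a minimal sketch, not part of the skill's scripts: the function name `suggest_page_range` is hypothetical, and the handling of durations that fall between rows (rounding up to the next row) is an assumption, since the table does not cover them.

```python
def suggest_page_range(minutes: float) -> tuple[int, int]:
    """Return an inclusive (min_pages, max_pages) range for a talk length."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    # Upper bound of each duration row, paired with its page range.
    # Assumption: durations between rows (e.g. 7 or 35 min) round up to the next row.
    brackets = [
        (5, (5, 5)),
        (15, (8, 12)),
        (30, (15, 20)),
    ]
    for upper_bound, pages in brackets:
        if minutes <= upper_bound:
            return pages
    return (25, 30)  # 45-60 min row, also used for anything above 30 min


print(suggest_page_range(12))  # (8, 12)
```

Treat the result as a starting point for the conversation with the user, not a hard limit.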
If the user provides a document or outline, read it thoroughly, then propose a slide breakdown for approval before proceeding.
Phase 2: Generate slides_plan.json
Create a slides_plan.json file in the workspace root with this schema:
```json
{
  "title": "Presentation Title",
  "total_slides": 10,
  "slides": [
    {
      "slide_number": 1,
      "page_type": "cover",
      "content": "Title: My Presentation\nSubtitle: A subtitle here\nLabel: 2026 Edition"
    },
    {
      "slide_number": 2,
      "page_type": "content",
      "content": "Title: First Topic\nKey points:\n- Point one\n- Point two\n- Point three"
    },
    {
      "slide_number": 3,
      "page_type": "data",
      "content": "Title: Key Metrics\nMetric 1: 95% accuracy\nMetric 2: 3x faster\nMetric 3: 10k users"
    }
  ]
}
```
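Before generating, it can help to sanity-check the plan against the schema above. The following is a minimal sketch under stated assumptions: `validate_plan` is a hypothetical helper, not one of the skill's scripts, and it assumes that the only valid page types are the three shown in the example and that slide numbers run sequentially from 1.

```python
# Assumption: these three page types (taken from the schema example) are the full set.
ALLOWED_PAGE_TYPES = {"cover", "content", "data"}


def validate_plan(plan: dict) -> list[str]:
    """Return a list of problems found in the plan; an empty list means none were found."""
    problems = []
    slides = plan.get("slides", [])
    if not plan.get("title"):
        problems.append("missing title")
    if plan.get("total_slides") != len(slides):
        problems.append(
            f"total_slides is {plan.get('total_slides')} but {len(slides)} slides are listed"
        )
    for position, slide in enumerate(slides, start=1):
        # Assumption: slide_number is expected to match the slide's position in the list.
        if slide.get("slide_number") != position:
            problems.append(
                f"slide at position {position} has slide_number {slide.get('slide_number')}"
            )
        if slide.get("page_type") not in ALLOWED_PAGE_TYPES:
            problems.append(f"slide {position} has unknown page_type {slide.get('page_type')!r}")
        if not str(slide.get("content", "")).strip():
            problems.append(f"slide {position} has empty content")
    return problems


example = {
    "title": "Presentation Title",
    "total_slides": 1,
    "slides": [{"slide_number": 1, "page_type": "cover", "content": "Title: My Presentation"}],
}
print(validate_plan(example))  # []
```

To check a plan on disk, load it with `json.load` and pass the resulting dictionary to the function.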
Critical Content Field Rules
The content field is what gets passed to the image generation model. Follow these rules strictly:
- DO write descriptive titles and bullet points
- DO describe the visual layout you want (e.g., "left-right comparison", "4 icon cards")
- DO NOT prefix lines with "Slogan:", "Visual:", "Points:", or any meta-labels — the model will render these as visible text on the slide
- DO NOT put the same sentence in both the title area and the bottom of the content — it causes duplication
- DO NOT include footer text, page numbers, or watermark instructions
Bad example (meta-labels leak as visible text):
Title: Why AI Matters
Visual: left-right comparison chart
Points:
- Point one
- Point two
Slogan: AI changes everything
Good example (clean, no meta-labels):
Title: Why AI Matters
Visual layout: left-right comparison chart showing traditional vs AI approach
Key points:
- Point one with brief explanation
- Point two with brief explanation
Bottom tagline: AI changes everything
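The meta-label rule can be checked mechanically before a plan is sent for generation. This is a minimal sketch, assuming only the three labels named in the rules ("Slogan:", "Visual:", "Points:") need to be caught; `find_meta_labels` is a hypothetical helper, and other labels would have to be added to the pattern by hand.

```python
import re

# Matches one of the three named labels at the start of a line, directly followed by a colon.
# "Visual layout:" and "Key points:" from the good example do not match.
META_LABEL = re.compile(r"^[ \t]*(Slogan|Visual|Points):", re.MULTILINE)


def find_meta_labels(content: str) -> list[str]:
    """Return the meta-labels found at line starts, in order of appearance."""
    return META_LABEL.findall(content)


bad = (
    "Title: Why AI Matters\n"
    "Visual: left-right comparison chart\n"
    "Points:\n"
    "- Point one\n"
    "- Point two\n"
    "Slogan: AI changes everything"
)
print(find_meta_labels(bad))  # ['Visual', 'Points', 'Slogan']
```

An empty result does not guarantee a clean slide; it only rules out the three known offenders.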
Phase 3: Select Style & Generate Slides
Available Styles
| Style | File | Visual Characteristics | Best For |
|---|---|---|---|
| Lineal Color | styles/lineal-color.md | White background, teal accents, flat 2D icons, info cards | Technical talks, lectures, educational |
| Gradient Glass | | Light pastel background, frosted glass cards, Apple Keynote feel | Product launches, pitches, SaaS |
| Vector Illustration | styles/vector-illustration.md | Cream background, black outlines, retro colors, toy-model charm | Educational, children's content, brand stories |
Present the styles to the user and let them choose. If unsure, recommend Lineal Color as the default.
Available Models
| Model | Speed | Quality | When to Use |
|---|---|---|---|
| gemini-3-pro-image-preview | Moderate | Best | Final version, important presentations |
| gemini-3.1-flash-image-preview | Fast | Good | Drafts, rapid iteration, large decks |
| | Fastest | Basic | Quick prototypes, bulk generation |
For first-time generation, recommend gemini-3.1-flash-image-preview (fast iteration). Switch to gemini-3-pro-image-preview for the final version.
Generate Command
```bash
python /skills/nano-banana/scripts/generate_ppt.py \
  --plan slides_plan.json \
  --style /skills/nano-banana/styles/lineal-color.md \
  --model gemini-3.1-flash-image-preview \
  --output ppt_output
```
Arguments:
- --plan (required): Path to slides_plan.json
- --style (required): Path to style template
- --model: Image generation model (default: gemini-3-pro-image-preview)
- : (default) or
- --output: Output directory (default: )
- : Google API key (if not in environment)
- : Number of parallel workers (default: 1, recommended: 3-5 for large decks)
Output structure:
ppt_output/
├── images/
│ ├── slide-01.png
│ ├── slide-02.png
│ └── ...
├── prompts.json # All prompts used (for debugging)
└── index.html # Browser viewer
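After a generation run, a quick check against the output structure above shows whether any slide failed to render. This is a minimal sketch: `missing_slide_images` is a hypothetical helper, and it assumes the two-digit, zero-padded `slide-NN.png` naming shown in the tree.

```python
import os


def missing_slide_images(output_dir: str, total_slides: int) -> list[str]:
    """Return the expected slide image filenames that are absent from <output_dir>/images."""
    images_dir = os.path.join(output_dir, "images")
    # Assumption: files are named slide-01.png, slide-02.png, ... as in the tree above.
    expected = [f"slide-{n:02d}.png" for n in range(1, total_slides + 1)]
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(images_dir, name))
    ]


if __name__ == "__main__":
    # Example: compare ppt_output against a 10-slide plan.
    print(missing_slide_images("ppt_output", 10))
```

Any filenames returned are candidates for a re-run of the generate command.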
Phase 4: Launch Review Server
Start the interactive review server so the user can review slides and write feedback:
```bash
python /skills/nano-banana/scripts/serve_viewer.py \
  --dir ppt_output \
  --plan slides_plan.json \
  --port 8080 \
  --pid-file .viewer.pid
```
Tell the user:
Review server is running at http://localhost:8080. Open it in your browser to review each slide. Write feedback in the text box below any slide that needs changes, then click "Save Feedback". Tell me when you're done.
The server saves feedback directly into slides_plan.json as a feedback field on each slide.
Wait for the user to confirm they have saved their feedback before proceeding.
Phase 5: Apply Feedback Edits
Read slides_plan.json and find all slides with a non-empty feedback field. For each one, run the edit script:
```bash
python /skills/nano-banana/scripts/edit_slide.py \
  --input ppt_output/images/slide-{NUMBER}.png \
  --instruction "{FEEDBACK_TEXT}" \
  --output ppt_output/images/slide-{NUMBER}.png \
  --model gemini-3.1-flash-image-preview
```
Arguments:
- --input (required): Path to the original slide image
- --instruction (required): The edit instruction (from the feedback field)
- --output: Output path (default: overwrite input)
- --model: Image generation model
- : Google API key (if not in environment)
After editing all slides with feedback, clear the feedback fields from slides_plan.json and tell the user to refresh the browser to see updated slides.
If the user has more feedback, repeat Phase 4-5. This review-edit cycle continues until the user is satisfied.
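The feedback loop in this phase can be sketched in Python. This is an illustration under stated assumptions, not the skill's own code: `build_edit_commands` and `clear_feedback` are hypothetical helpers, the per-slide field is assumed to be named `feedback`, image names are assumed to be zero-padded to two digits, and "clearing" is assumed to mean removing the field.

```python
def build_edit_commands(
    plan: dict,
    images_dir: str = "ppt_output/images",
    model: str = "gemini-3.1-flash-image-preview",
) -> list[list[str]]:
    """Build one edit_slide.py argument list per slide that has non-empty feedback."""
    commands = []
    for slide in plan.get("slides", []):
        feedback = str(slide.get("feedback") or "").strip()
        if not feedback:
            continue  # nothing to change on this slide
        image = f"{images_dir}/slide-{slide['slide_number']:02d}.png"
        commands.append([
            "python", "/skills/nano-banana/scripts/edit_slide.py",
            "--input", image,
            "--instruction", feedback,
            "--output", image,  # overwrite the slide in place
            "--model", model,
        ])
    return commands


def clear_feedback(plan: dict) -> None:
    """Remove the feedback field from every slide (assumed meaning of 'clear')."""
    for slide in plan.get("slides", []):
        slide.pop("feedback", None)


plan = {"slides": [
    {"slide_number": 1, "content": "Title: A"},
    {"slide_number": 2, "content": "Title: B", "feedback": "Make the title larger"},
]}
for command in build_edit_commands(plan):
    print(command[3], "<-", command[5])
```

Building each command as an argument list and running it with `subprocess.run(command, check=True)` avoids shell-quoting problems when the feedback text contains quotes or other special characters.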
Phase 6: Package as PPTX
Once the user approves all slides, ask for the desired filename and package them:
```bash
python /skills/nano-banana/scripts/package_pptx.py \
  --dir ppt_output/images \
  --output presentation.pptx \
  --kill-server .viewer.pid
```
Arguments:
- --dir (required): Directory containing slide-XX.png images
- --output (required): Output .pptx file path
- --kill-server: PID file from serve_viewer.py; automatically stops the review server after packaging
Phase 7: Cleanup
- The review server is automatically stopped by package_pptx.py --kill-server
- Ask the user whether they want to keep the output directory (ppt_output in the examples above) or clean it up
- The slides_plan.json file can be kept for future re-generation
Counterintuitive Rules
- Never include meta-labels in content: words like "Slogan:", "Visual:", "Points:" will be rendered as visible text on the slide. Describe what you want without prefixes.
- Content describes WHAT, not HOW: the style template handles visual layout. The content field should focus on text and logical structure, not colors or positioning.
- More planning = better slides: spending 10 minutes on the Phase 1 conversation saves hours of re-generation. Do not rush to Phase 3.
- Edit, don't regenerate: when a slide needs minor changes (text fix, color change, remove footer), use edit_slide.py instead of regenerating from scratch. Editing preserves visual consistency.
- Use the flash model for drafts: gemini-3.1-flash-image-preview is fast enough for iteration. Only switch to gemini-3-pro-image-preview for the final version after all feedback is addressed.
- Never read generated images yourself: not all models support multimodal input. Do NOT read generated PNG images yourself to check quality. Always launch the review server and let the user inspect slides visually in the browser. The user's feedback is your only quality signal.
- One idea per slide: do not pack multiple concepts into a single slide. If a slide has more than 4 bullet points, split it into two slides.
- Bottom taglines should not repeat the title: if the title says "Why AI Matters", the bottom tagline should add new insight, not restate the title.
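The "one idea per slide" rule lends itself to a quick automated check on each content field. A minimal sketch, assuming bullets are written as lines starting with "- " as in the schema example; `needs_split` is a hypothetical helper name.

```python
def needs_split(content: str, max_bullets: int = 4) -> bool:
    """Return True when the content has more bullet lines than the rule allows."""
    # Assumption: a bullet is any line whose first non-space characters are "- ".
    bullets = [line for line in content.splitlines() if line.lstrip().startswith("- ")]
    return len(bullets) > max_bullets


crowded = "Title: Roadmap\nKey points:\n- One\n- Two\n- Three\n- Four\n- Five"
print(needs_split(crowded))  # True
```

A slide flagged this way is a candidate for splitting during Phase 1 or Phase 2, before any image is generated.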
Scripts Reference
| Script | Purpose | Key Arguments |
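|---|---|---|
| generate_ppt.py | Batch generate all slides from plan | --plan, --style, --model, --output |
| edit_slide.py | Edit a single slide based on instruction | --input, --instruction, --output, --model |
| serve_viewer.py | Local review server with feedback | --dir, --plan, --port, --pid-file |
| package_pptx.py | Package slide images into .pptx | --dir, --output, --kill-server |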
Style Template Format
Style templates are markdown files in styles/ with a fixed structure that generate_ppt.py parses:
| Section | Purpose | Parsed by Code |
|---|---|---|
| | Visual specifications shared by all slides | Yes, injected into every prompt |
| | Layout descriptions per page type | Fallback only |
| | Actual prompt templates with and placeholders | Yes, primary templates |
| Other sections | Documentation only | No |
To create a new style: copy an existing
file, modify the
and
sections. The code extracts
,
, and
code blocks from
.